A Study on Combining VTLN and SAT to Improve the Performance of Automatic Speech Recognition

نویسندگان

  • Doddipatla Rama Sanand
  • Mikko Kurimo
چکیده

In this paper, we present ideas to combine VTLN and SAT to improve the performance of automatic speech recognition. We show that VTLN matrices can be used as SAT transformation matrices in recognition, though the training still follows conventional SAT. This will be useful when there is very little adaptation data and the SAT transformation matrix can not be estimated to perform the required adaptation. We also present a study to understand whether VTLN can be performed after SAT and whether such a combination is better than the conventional approach, where VTLN is performed before SAT. Finally, we present a novel approach to perform VTLN by using VTLNmatrices in cascade. This allows us to include warping-factors that are not included in the initial search space. We show through recognition experiments that these combinations improve the performance of ASR, with major gains in the mis-matched train and test speaker conditions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Abstract   Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

متن کامل

Impact of Vocal Tract Length Normalization on the Speech Recognition Performance of an English Vowel Phoneme Recognizer for the Recognition of Children Voices

Differences in human vocal tract lengths can cause inter speaker acoustic variability in speech signals spoken by different speakers for the same textual version and due to these variations, the robustness of a speaker independent (SI) speech recognition system is affected. Speaker normalization using vocal tract length normalization (VTLN) is an effective approach to reduce the affect of these...

متن کامل

Use of spectral centre of gravity for generating speaker invariant features for automatic speech recognition

In this paper, we present an approach to generate speaker invariant features for automatic speech recognition (ASR) using the idea of spectral centre of gravity(CG). This is based on the observation that if two signals are delayed versions of one another, then their CG’s also differ by the same amount. We exploit this idea to appropriately shift the mel warped log compressed spectra using the e...

متن کامل

تخمین سریع ضرایب پیچش در هنجارسازی طول مجرای صوتی با استفاده از امتیاز به دست آمده از مدلسازی تشخیص جنسیت

The performance of automatic speech recognition (ASR) systems is adversely affected by the variations in speakers, audio channels and environmental conditions. Making these systems robust to these variations is still a big challenge. One of the main sources of variations in the speakers is the differences between their Vocal Tract Length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...

متن کامل

A Study on Speaker Normalized MLP Features in LVCSR

Different normalization methods are applied in recent Large Vocabulary Continuous Speech Recognition Systems (LVCSR) to reduce the influence of speaker variability on the acoustic models. In this paper we investigate the use of Vocal Tract Length Normalization (VTLN) and Speaker Adaptive Training (SAT) in Multi Layer Perceptron (MLP) feature extraction on an English task. We achieve significant...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011